The Basics

What is this project all about?

The simple answer is that you are going to tell a story with data. This is your opportunity to showcase the skills you’ve learned in this course!

Like I mentioned above, each person should spend at least 15 hours on this project. So, the story you tell has to be a bit complex.

I think the most important piece is choosing data or a research question that you are passionate about! Or, to reference Marie Kondo, one that sparks joy.

For example, I love cycling, and I have data on almost all my bike rides for the past 4 or 5 years. Wouldn’t it be cool to tell a story with those data?

I am also passionate about education. I think a lot about how schools are funded. Because I’ve been very involved in fundraising at my kids’ school, I am very curious about how PTO/PTA/etc. influence what types of resources schools are able to have. Listening to the “Nice White Parents” podcast this summer made me even more curious to investigate this. Maybe I could find some data regarding that ….

Where do I find data?

  • Like I mentioned above, FIRST find your passion.
  • Once you decide on a topic area, try to come up with some interesting research questions. For example, to use the bike data: Where do I bike most often? How fast do I go? Do I have a typical pattern in my long rides? Could I add some weather information to the data? … Where would I find that? … How would I add that? ….
  • Look for data. In my bike example, it’s pretty easy. The data are on my Garmin. But how do I get them out of there and into R? That’s a harder question, but I bet a bit of searching on the internet might lead you in the right direction. What if your data are not on a Garmin?
    • Try an internet search first. That might lead you to a good source.
    • Kaggle has a ton of data sets. I would not recommend going there first, though. FIRST, find your passion.
    • TidyTuesday data … but again, FIRST, find your passion.
    • If there is a certain subject area you are interested in and you know a professor who studies that subject, you could ask them where to find some good data.
    • If there’s data you want from the internet that is not readily available, you could maybe scrape it from the web. We’ll learn some of these techniques very soon.
    • Come talk to me! If you have your passion, I can help you find the data!
    • Maybe a local small business or nonprofit would have some data you could use … this would be especially good if it is something you are passionate about!

What does my final project look like?

I am giving you quite a bit of flexibility in what your final project looks like, but I see three larger categories. The one requirement is that it is done completely within RStudio. Assume your audience consists of people who read pop-statistics and pop-computing blogs. You should assume that this audience is not familiar with your project but is comfortable with the fundamentals of data science.

  1. Technical blog post: similar to a paper, but a bit more casual and perhaps featuring more figures than you would typically include in a paper. We could find tons of examples. Here’s just a few:
  2. Shiny app: a type of interactive dashboard that we’ll discuss soon. This option involves less writing but likely requires you to learn a little more coding on your own. In addition to creating the Shiny app, you will also be required to submit a “User’s manual” that describes how someone would interact with the app. Check out some examples at the Shiny app gallery, Show me Shiny gallery, and flexdashboard gallery.
  3. Recorded presentation: material-wise, similar to the technical blog post, but instead of laying it out like an article, you will create slides and talk about your results.

Details

Structure:

Title: A descriptive title and a list of all group members, both placed in the YAML section of the document.

Introduction and background: An introduction that motivates & outlines a clear, specific set of research questions. Also, provide some background on your topic.

Data collection: Specification of your data sources and collection process.

Analysis: This is the bulk of the report. It either presents the group’s key findings and take-aways or, for a Shiny app, details how someone would interact with the app and what they should take away from it. If you choose to do a Shiny app, be sure to include a link to the shinyapps.io site.

Grading:

Structure & layout: The report follows the above structure and utilizes section/sub-section titles so that readers can easily navigate the report.

Storytelling & cohesion:

  • Goals & research questions are clear.
  • Findings are woven together in a cohesive story, rather than presented as a list of distinct ideas.
  • You don’t try to present everything you’ve done throughout the project. Rather, you pick the most insightful and cohesive aspects.
  • Your report showcases the entire life cycle of your project (e.g., data collection to conclusions).

Results:

  • Your data visualizations / analyses are meaningful, i.e., they support the investigation of your research questions.
  • Your data visualizations / analyses are easy to interpret. (Not sure if this is the case? Hand off your report to at least one friend outside of the class and ask them to interpret your visualizations. If they cannot interpret your work, go back and make the appropriate changes.)
  • You thoroughly discuss any assumed notation / definitions and don’t use RStudio notation (e.g., my_really_ugly_variable_name) in your text, visualizations, etc.
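
One way to keep raw variable names out of a figure is to relabel the axes with labs() from ggplot2. The following is only a minimal sketch: the rides data frame, its column names, and its values are all made up for illustration, so swap in your own data.

```r
library(ggplot2)

# Hypothetical ride data; the column names and values are made up for illustration.
rides <- data.frame(
  ride_dist_km = c(12.4, 30.1, 55.0, 21.7),
  avg_spd_kph  = c(22.5, 24.1, 26.3, 23.0)
)

# The aesthetics still use the raw column names,
# but labs() gives the reader plain-language labels instead.
ride_plot <- ggplot(rides, aes(x = ride_dist_km, y = avg_spd_kph)) +
  geom_point() +
  labs(
    title = "Average speed by ride distance",
    x = "Ride distance (km)",
    y = "Average speed (km/h)"
  )

ride_plot
```

Without the labs() call, the axes would show ride_dist_km and avg_spd_kph, which is exactly the kind of notation a reader outside the class is likely to stumble over.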

Code:

  • Unnecessary code and output (including errors) are eliminated. Like with your homework, include the options R code chunk at the top to suppress messages and warnings. You should also hide the code by adding echo=FALSE to the options. You may want to wait until your analysis is complete to add these options, because you don’t want to miss messages and warnings while you’re still working on your analysis.
  • Limit the data you print to what is absolutely necessary for the reader, and make your tables look nice. Use functions from the gt package!
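
As a rough sketch of both points, the code below sets the chunk options with knitr::opts_chunk$set() and then formats a small table with gt. The ride_summary data frame and its values are hypothetical and made up for illustration. In a real R Markdown file, the options call would sit in its own setup chunk near the top rather than next to the table code.

```r
# Setup options: in an .Rmd file, put this in a chunk near the top.
knitr::opts_chunk$set(
  echo = FALSE,     # hide code in the knitted document
  message = FALSE,  # hide messages (e.g., from loading packages)
  warning = FALSE   # hide warnings
)

library(gt)

# Hypothetical summary table; the values are made up for illustration.
ride_summary <- data.frame(
  year     = c(2019, 2020),
  n_rides  = c(85, 112),
  total_km = c(2140.5, 3015.2)
)

# Print only a small summary, with readable column labels
# instead of raw variable names.
ride_table <- gt(ride_summary) |>
  tab_header(title = "Rides per year") |>
  cols_label(
    year     = "Year",
    n_rides  = "Number of rides",
    total_km = "Total distance (km)"
  ) |>
  fmt_number(columns = total_km, decimals = 0)

ride_table
```

The native pipe |> assumes R 4.1 or later; with an older version, the %>% pipe would do the same job here.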

Professionalism:

  • There are no grammatical errors / typos.
  • The knitted HTML document doesn’t contain formatting errors.
  • Figures are appropriately sized, nicely laid out, and adeptly labeled.

A more detailed description of the project evaluations can be found here (I am still updating this).

How to hand in your assignment

Only one member of the group should hand in the final product. This could include knitted HTML files, videos or links to videos, slides (made in R), etc.

In the text input field of their own Moodle assignment, every member of the group should individually describe the work each group member contributed to the final project. If you or another member of the group contributed substantially more or less work overall, make this clear.